We compare the algorithms first on the basis of statistical mutual agreement and consistency, in the absence of true anomaly labels, and subsequently on the basis of visual interpretation. The idea is to look at the differences in detection between the two schemes across common time-frames and network entities (e.g., CMTS), prioritised by anomaly intensity/score, i.e., in order of anomaly intensity from each detection scheme, which detected periods and entities are common, and conversely, where do the two differ?
We then plot the top-k detected time-series to obtain a visual representation and to discover differences between the two methods. Through this process we would not only discover strengths and weaknesses of the algorithms, but also anticipate the customer reaction should we move to a new algorithm.
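As a concrete sketch of the agreement check, a helper like the following (the `topk_overlap` name and toy scores are ours, not part of the pipeline) computes the fraction of entities shared by the top-k of two anomaly rankings:

```python
import pandas as pd

def topk_overlap(scores_a, scores_b, k=10):
    """Fraction of entities shared by the top-k of two anomaly rankings.

    `scores_a` and `scores_b` are Series of anomaly scores indexed by
    entity (e.g. CMTS name); higher score = more anomalous.
    """
    top_a = set(scores_a.nlargest(k).index)
    top_b = set(scores_b.nlargest(k).index)
    return len(top_a & top_b) / float(k)

# toy example with made-up scores for three CMTS entities
a = pd.Series({'cmts_1': 9.0, 'cmts_2': 7.0, 'cmts_3': 1.0})
b = pd.Series({'cmts_1': 8.0, 'cmts_3': 6.0, 'cmts_2': 0.5})
overlap = topk_overlap(a, b, k=2)  # the top-2 sets share only cmts_1
```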
import pandas as pd
import seaborn as sns
sns.set_palette("muted")
pd.set_option("display.width",150)
%pylab inline
%%time
np.random.seed(5)
header = ["call_date_time","call_lob_data","call_lob_voice","call_lob_video","network_level_3_name",\
"network_level_9_name","device_data_model","device_voice_model","device_video_model","ticket_id"]
calls = pd.read_csv("data/call_dump",sep="|",names=header,usecols=range(1,len(header)+1))
calls['call_date_time'] = calls['call_date_time'].convert_objects(convert_numeric=True).dropna().astype("int").astype("str")
for col in header:
    calls[col] = calls[col].astype("str").map(str.strip).replace({"NULL":np.nan,"nan":np.nan},regex=True)
calls = calls.dropna(subset=['call_date_time'])
calls['date_hour'] = pd.DatetimeIndex(pd.to_datetime(calls['call_date_time'].str[:10],format="%Y%m%d%H")).tz_localize("UTC").tz_convert("US/Pacific-New")
The Comcast network follows a hierarchy with a single parent per child:
We group calls by the following variables to gather hourly call-counts per level in a time-series for the entire duration:
Next, we take each time-series and score it using the double-stochastic detection scheme introduced earlier. We gather multi-dist scores from a pre-scored anomaly table, where each detected instance is graded into one of three levels via the "severity" column, representing 'low', 'medium', and 'high' intensity anomalies.
start,end = '2016-04-01 00:00:00','2016-07-22 00:00:00'
timeframe = pd.date_range(start=start,end=end,freq="H").tz_localize("US/Pacific-New")
period = timeframe[-344:]
from ipyparallel import Client
import os
import time
os.system("nohup ipcluster start -n 24 &")
time.sleep(5)
rc = Client()
print rc.ids
dview = rc[:]
Here is the distribution of the mean number of calls (x-axis) against the number of levels of the variable (y-axis). The plots below show that levels at SYSTEM have higher call-arrival rates on average than levels at CMTS. Please note that only those calls have been counted where call_is_deflected=False (i.e., the call is not deflected) and the time between consecutive calls for a subscriber exceeds 24 hours (using chain_time_lag). We observe that this filter leads to matching event-counts between the enriched_anomaly and enriched_call tables.
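The filter described above was applied upstream of the dump loaded here; as a sketch (the column names follow the text, but their dtypes and value formats are assumptions), it amounts to:

```python
import pandas as pd

def filter_calls(df):
    # keep non-deflected calls whose gap to the subscriber's previous
    # call exceeds 24 hours (chain_time_lag assumed to be in hours)
    keep = (df['call_is_deflected'] == False) & (df['chain_time_lag'] > 24)
    return df[keep]
```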
%%time
from score import score
series = ['network_level_3_name','network_level_9_name']
fig,axes = plt.subplots(1,2,figsize=(15,5))
i=0
out = {}
for var in series:
    sup = calls.groupby(["date_hour",var])['call_date_time'].count().unstack().loc[timeframe].fillna(0)
    sup.mean().hist(bins=15,ax=axes[i])
    axes[i].set_xlabel("mean #calls"), axes[i].set_ylabel("#levels in %s"%var)
    i+=1
    out[var] = dict(zip(sup.columns,dview.map_sync(score, [sup[col] for col in sup.columns], [period]*len(sup.columns))))
Next, we extend the detection to combinations of two variables, namely device video/voice/data models and CMTS (network_level_9). Only calls of the LOB corresponding to the device type have been considered in the counts. This gives time-series with even lower event-counts, letting us analyse the detection sensitivity of the new scheme on data containing many zeros. Here is the distribution of mean #calls per level of the combination:
%%time
devices = ['voice','video','data']
levels = ['device_%s_model','network_level_9_name']
fig,axes = plt.subplots(1,3,figsize=(17,5))
i=0
for device in devices:
    sup = calls[calls['call_lob_%s'%device]=="true"].groupby(["date_hour",levels[0]%device,levels[1]])['call_date_time'].count().unstack([1,2]).loc[timeframe].fillna(0)
    sup = sup.loc[:,sup.max()>=5]
    sup.mean().hist(bins=15,ax=axes[i])
    axes[i].set_xlabel("mean #calls"), axes[i].set_ylabel("#levels in %s + network_level_9"%device)
    i+=1
    print "# of levels with a max. of at least 5 calls for device type %s: %s"%(device,sup.shape[1])
    out[device+"_level_9"] = dict(zip(sup.columns,dview.map_sync(score, [sup[col] for col in sup.columns], [period]*len(sup.columns))))
A ranking model's purpose is to rank, i.e., to produce a permutation of items in new, unseen lists in a way which is "similar", in some sense, to rankings in the training data. We choose a common duration of time (14 days: 2016-07-09 00:00 to 2016-07-21 23:00) and rank the detections from both methods (the precomputed "severity" column for multi-dist and the scores computed above for double-stochastic) to find the most anomalous time-series across all levels in SYSTEM and CMTS from each method.
The analysis is extended to include statistical measures of comparison and to develop a quantitative characterization of each method's relative performance, in addition to the purely qualitative visual interpretations. The following measures are chosen to compare the algorithms:
Precision & Recall: used to measure the detection performance of double-stochastic. These are computed on the double-stochastic scores, treating all detections from multi-dist (any severity: 1, 2, 3) as representative of true anomalies. This gives a baseline comparison criterion that quantifies consistency between the two methods in a simple yes/no fashion.
Discounted cumulative gain (DCG): a measure of ranking quality, often used to measure the effectiveness of web search algorithms. Using a graded relevance scale, DCG measures the usefulness (or gain) of a result based on its position. The gain is accumulated from the top of the result list to the bottom with rank discounting. The discounted CG accumulated at a particular rank position p is defined as:
$$\mathrm{DCG_p}=\sum_{i=1}^{p}\frac{2^{rel_i}-1}{\log_2(i+1)}$$
Two assumptions are made in using DCG and its related measures for evaluating anomaly detection:
* Alarms can be graded into severity levels representing true relevance (multi-dist severity).
* Severe alarms are more useful when ranked higher (from double-stochastic scores).
The cumulative gain at each position is normalized across detection levels by sorting all alarms by their relative relevance, producing the maximum possible DCG through position p, also called Ideal DCG (IDCG) through that position. The normalized discounted cumulative gain, or nDCG, is then computed as:
$$\mathrm{nDCG_p}=\frac{DCG_p}{IDCG_p}$$
header = ["timebin","event_count","severity","rating","ranking","feature1","value1","feature2","value2","call_lob_data","call_lob_video","call_lob_voice","device_video_model","device_data_model","device_voice_model"]
prod_anomaly = pd.read_csv("data/anomaly_dump",sep="|",skiprows=3,usecols=[3,6,8,9,10,14,15,16,17,18,19,20,26,30,34],names=header)
for col in header:
    prod_anomaly[col] = prod_anomaly[col].astype("str").map(str.strip).replace({"NULL":np.nan,"nan":np.nan},regex=True)
prod_anomaly = prod_anomaly.dropna(subset=['timebin'])
prod_anomaly = prod_anomaly.convert_objects(convert_numeric=True)
prod_anomaly['timebin'] = prod_anomaly['timebin'].dropna().astype("int").astype("str")
prod_anomaly['date_hour'] = pd.DatetimeIndex(pd.to_datetime(prod_anomaly['timebin'].str[:10],format="%Y%m%d%H"))
prod_anomaly['minutes'] = prod_anomaly['timebin'].str[10:12]
prod_anomaly = prod_anomaly[prod_anomaly['severity']!="severity"]  # drop repeated header rows
prod_anomaly['severity'] = prod_anomaly['severity'].replace({"LOW":1,"MEDIUM":2,"HIGH":3}).astype("float64")
def dcg_score(y_true, y_score, k=10, gains="exponential"):
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])
    if gains == "exponential":
        gains = 2 ** y_true - 1
    elif gains == "linear":
        gains = y_true
    else:
        raise ValueError("Invalid gains option.")
    discounts = np.log2(np.arange(len(y_true)) + 2)
    return np.sum(gains / discounts)
def ndcg_score(y_true, y_score, k=10, gains="exponential"):
    best = dcg_score(y_true, y_true, k, gains)
    actual = dcg_score(y_true, y_score, k, gains)
    return actual / best
def compute_baseline(timeseries):
    seg = []
    for day in range(7):
        for hour in range(24):
            sub = timeseries[timeseries.index.dayofweek==day]
            sub = sub[sub.index.hour==hour]
            seg.append(sub)
    baseline = pd.concat([pd.rolling_median(sub,5).shift(1) for sub in seg]).sort_index().fillna(0)
    return baseline
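The baseline above keeps, for each (weekday, hour) slot, the rolling median of the five preceding same-slot values. A minimal restatement using the newer rolling syntax, sanity-checked on a constant synthetic series (the `weekly_baseline` name is ours):

```python
import pandas as pd

def weekly_baseline(ts):
    # per (weekday, hour) slot: rolling median of the 5 prior weeks,
    # shifted by one so each point only sees its own past
    segs = []
    for day in range(7):
        for hour in range(24):
            sub = ts[(ts.index.dayofweek == day) & (ts.index.hour == hour)]
            segs.append(sub.rolling(5).median().shift(1))
    return pd.concat(segs).sort_index().fillna(0)

idx = pd.date_range("2016-04-04", periods=24 * 7 * 8, freq="h")  # 8 weeks
baseline = weekly_baseline(pd.Series(10.0, index=idx))
```

On a constant series the baseline converges to that constant once five weeks of history are available, and is zero-filled before that.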
for key in out.keys():
    out[key] = pd.concat(out[key],axis=1).sort_index()
    out[key].index = out[key].index.tz_localize(None)
cut = 15
detection = {"device+level_9":[],"network_level_3_name":[],"network_level_9_name":[]}
start,end = '2016-07-09 00:00:00','2016-07-21 23:00:00'
time_range = pd.date_range(start=start,end=end,freq="H")
for device in devices:
    sup = calls[calls['call_lob_%s'%device]=="true"].groupby(["date_hour",levels[0]%device,levels[1]])['call_date_time'].count().unstack([1,2]).loc[timeframe].fillna(0)
    anom = []
    if sup.index.tz is not None:
        sup.index = sup.index.tz_convert("UTC").tz_localize(None)
    for time in time_range:
        temp = out[device+"_level_9"].xs("double_stochastic",axis=1,level=2).loc[time].dropna()
        anom.append(pd.DataFrame(temp))
    detected = pd.concat(anom).stack().groupby(level=[0,1,2]).max()
    mask = (prod_anomaly['minutes']=='00') & (prod_anomaly['date_hour']>=time_range[0]) & \
           (prod_anomaly['date_hour']<time_range[-1]) & (prod_anomaly['feature1']==str.upper(levels[0]%device)) & \
           (prod_anomaly['feature2']==str.upper(levels[1]))
    prod_detected = prod_anomaly[mask].groupby(['value1','value2','date_hour'])['severity'].max()
    temp = pd.concat({"double_stochastic":detected[detected>cut],"multi_dist":prod_detected},axis=1).dropna(how="all").fillna({"double_stochastic":cut,"multi_dist":0})
    temp.loc[temp['double_stochastic']>500,"double_stochastic"] = 500
    temp.index = pd.MultiIndex.from_tuples(zip(temp.index.get_level_values(0)+"_%s"%device,temp.index.get_level_values(1),temp.index.get_level_values(2)))
    temp['anomaly_multi_dist'] = 0
    temp.loc[temp["multi_dist"]>0,"anomaly_multi_dist"] = 1
    detection["device+level_9"].append(temp)
detection['device+level_9'] = pd.concat(detection['device+level_9'])
for var in series:
    anom = []
    sup = calls.groupby(["date_hour",var])['call_date_time'].count().unstack().loc[timeframe].fillna(0)
    if sup.index.tz is not None:
        sup.index = sup.index.tz_convert("UTC").tz_localize(None)
    for time in time_range:
        temp = out[var].xs("double_stochastic",axis=1,level=1).loc[time]
        anom.append(pd.DataFrame(temp))
    detected = pd.concat(anom).stack().groupby(level=[0,1]).max()
    mask = (prod_anomaly['minutes']=='00') & (prod_anomaly['date_hour']>=time_range[0]) & \
           (prod_anomaly['date_hour']<time_range[-1]) & (prod_anomaly['feature1']==str.upper(var)) & \
           (prod_anomaly['feature2'].isnull())
    prod_detected = prod_anomaly[mask].groupby(['value1','date_hour'])['severity'].max()
    temp = pd.concat({"double_stochastic":detected[detected>cut],"multi_dist":prod_detected},axis=1).dropna(how="all").fillna({"double_stochastic":cut,"multi_dist":0})
    temp.loc[temp['double_stochastic']>500,"double_stochastic"] = 500
    temp['anomaly_multi_dist'] = 0
    temp.loc[temp["multi_dist"]>0,"anomaly_multi_dist"] = 1
    detection[var] = temp
Here is a box-plot of the double-stochastic scores per level of multi-dist severity. It clearly demarcates the severity levels from an agreement/consistency point of view.
fig,axes = plt.subplots(1,3,figsize=(15,5))
fig.text(0.5, 0.01, 'severity multi_dist', ha='center', va='center'), fig.tight_layout()
i=0
for var in detection.keys():
    sns.boxplot(x="multi_dist", y="double_stochastic", whis=[0.25,0.75],data=detection[var],ax=axes[i])
    axes[i].legend(labels=[var]),axes[i].set(yscale="log",ylim=[10,np.percentile(detection[var]['double_stochastic'],99)],xlabel='',ylabel='')
    i+=1
axes[0].set_ylabel("double_stochastic")
from sklearn.metrics import precision_recall_curve
ks = np.arange(1,11)*10
fig,axes = plt.subplots(1,2,figsize=(15,6))
dcg = {}
for var in detection.keys():
    pr,re,th = precision_recall_curve(detection[var]['anomaly_multi_dist'].values,detection[var]['double_stochastic'].values)
    axes[0].plot(re[pr>pr[np.argmax(re)]],pr[pr>pr[np.argmax(re)]],label=var,lw=3), axes[0].set_xlabel("Recall"), axes[0].set_ylabel("Precision")
    dcg[var] = pd.Series(index=ks,data=np.array([ndcg_score(detection[var]['multi_dist'].values,detection[var]['double_stochastic'].values,k=k) for k in ks]))
pd.concat(dcg,axis=1)[detection.keys()].plot(kind="bar",ax=axes[1])
axes[1].legend(loc="upper right",bbox_to_anchor=(1.0,1.2)), axes[1].set_xlabel("top k alarms"), axes[1].set_ylabel("nDCG@k")
fig.tight_layout()
Next, in order to highlight the agreement/consistency of the two detection schemes, we train a statistical model with one as a monotonic function of the other. The underlying goal is to transform the scores into a "model" and "residual" space where each instance can be quantified in terms of its "dissimilarity" under the two methods, the advantage being that more dissimilar instances are prioritized for further analysis.
Proportional Odds Model: a regression model for ordinal dependent variables, for example, a choice among "poor", "fair", "good", "very good", and "excellent", given a set of predictors. It can be thought of as an extension of the binary logistic regression model allowing for more than two (ordered) response categories.
We predict the ordered multi-dist severity levels using the double-stochastic scores as the predictor. In this model, the log-odds are assumed to be linear in the predictors, so all relationships are constrained to be monotonic (linear, to be precise), which is a desirable attribute for our comparative purposes. The fundamental goal is to account for the variance explained between the two methods and subsequently transform the scores into a model + residual space, where overall agreement is quantified using measures like Mean Absolute Error per ordinal level and inconsistency is measured in the residuals.
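Concretely, with ordered severity levels $j$ and the log double-stochastic score $x$ as the single predictor (matching the polr fit below), the model takes the form:
$$\log\frac{P(\text{severity}\le j\mid x)}{P(\text{severity}>j\mid x)}=\theta_j-\beta x$$
A single slope $\beta$ is shared across all severity thresholds and only the intercepts $\theta_j$ differ, which is exactly what constrains the fitted relationship to be monotonic.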
import readline
import rpy2
%load_ext rpy2.ipython
%R require("MASS")
%R require("regr0")
%R i<-1
for key in detection.keys():
    temp = detection[key][['multi_dist','double_stochastic']].copy()
    %Rpush temp
    %R new <- data.frame(double_stochastic=as.numeric(temp[,'double_stochastic']),multi_dist=as.factor(temp[,'multi_dist']))
    %R logreg <- paste("logreg_", i, sep = "")
    %R assign(logreg, polr(multi_dist~log(double_stochastic), data=new, Hess=TRUE))
    %R resid_logreg <- residuals(eval(parse(text=logreg)))[,1]
    %R prediction <- predict(eval(parse(text=logreg)))
    %Rpull resid_logreg
    %Rpull prediction
    %R i=i+1
    detection[key]['residuals'] = resid_logreg
    detection[key]['prediction'] = prediction.astype("int64")
%%R -h 350 -w 1000
par(mfrow=c(1,3))
for(i in 1:3){
    logreg <- paste("logreg_", i, sep = "")
    plTA.polr(eval(parse(text=logreg)))
}
fig,axes = plt.subplots(1,3,figsize=(19,5))
i=0
for col in detection.keys():
    groups = detection[col].groupby("multi_dist").groups
    mae,sev = [],[]
    for severity,index in groups.iteritems():
        mae.append(np.abs(detection[col].loc[index]['prediction'] - severity).mean()), sev.append(severity)
    axes[i].bar(sev,mae), axes[i].set(ylabel="Mean Absolute Error",xlabel="severity_multi_dist",title=col,xticks=[0,1,2,3],xticklabels=['none','low','med','high'])
    i+=1
Finally, we plot the top-ranked detections by their inconsistency, obtained from the residual space produced by fitting the proportional odds model. This is done for each network variable (SYSTEM, CMTS, CMTS+Device) taken individually over the chosen duration, along with the baseline plot to aid visual representation and qualitative analysis. The plots are ranked by degree of inconsistency and can be divided into two categories: one where double-stochastic scores are relatively lower than multi-dist severity, and vice versa.
cut = 25
from IPython.display import display
cols = ['multi_dist','double_stochastic','residuals']
for col in detection.keys():
    if col != "device+level_9":
        levels = detection[col].sort("residuals")[-10:].index.get_level_values(0)
        for level in set(levels):
            sup = calls[calls[col]==level].groupby('date_hour')['call_date_time'].count().loc[timeframe].fillna(0)
            if sup.index.tz is not None:
                sup.index = sup.index.tz_convert("UTC").tz_localize(None)
            fig = plt.figure(figsize=(15,4))
            detected = detection[col].xs(level,level=0)
            times_DS,intensity_DS = detected[detected['double_stochastic']>cut].sort("double_stochastic")[-5:].index, detected[detected['double_stochastic']>cut]['double_stochastic'].order()[-5:].values.tolist()
            times_MD,intensity_MD = detected[detected['multi_dist']>0].index, detected[detected['multi_dist']>0]['multi_dist'].values.tolist()
            sup.loc[period.tz_convert("UTC").tz_localize(None)].plot(label="observed")
            compute_baseline(sup).loc[period.tz_convert("UTC").tz_localize(None)].plot(label="baseline")
            plt.scatter(list(times_DS),sup.ix[times_DS].values,c='r',marker="*",s=70,alpha=0.6, label="Double-Stochastic: %s"%np.round(intensity_DS))
            plt.scatter(list(times_MD),sup.ix[times_MD].values,c='g',marker="^",s=70,alpha=0.6, label="Multi-Dist: %s"%intensity_MD)
            plt.suptitle(col+": "+level), plt.legend()
    else:
        temp = detection[col].sort("residuals")[-10:].index
        levels = [tuple(mod[0].split("_")+[mod[1]]) for mod in temp]
        for level in set(levels):
            mask = (calls["device_%s_model"%level[1]]==level[0]) & (calls['call_lob_%s'%level[1]]=="true") & (calls['network_level_9_name']==level[2])
            sup = calls[mask].groupby('date_hour')['call_date_time'].count().loc[timeframe].fillna(0)
            if sup.index.tz is not None:
                sup.index = sup.index.tz_convert("UTC").tz_localize(None)
            fig = plt.figure(figsize=(15,4))
            detected = detection[col].xs("%s_%s"%(level[0],level[1]),level=0).xs(level[2],level=0)
            times_DS,intensity_DS = detected[detected['double_stochastic']>cut].sort("double_stochastic")[-5:].index, detected[detected['double_stochastic']>cut]['double_stochastic'].order()[-5:].values.tolist()
            times_MD,intensity_MD = detected[detected['multi_dist']>0].index, detected[detected['multi_dist']>0]['multi_dist'].values.tolist()
            sup.loc[period.tz_convert("UTC").tz_localize(None)].plot(label="observed")
            compute_baseline(sup).loc[period.tz_convert("UTC").tz_localize(None)].plot(label="baseline")
            plt.scatter(list(times_DS),sup.ix[times_DS].values,c='r',marker="*",s=70,alpha=0.6, label="Double-Stochastic: %s"%np.round(intensity_DS))
            plt.scatter(list(times_MD),sup.ix[times_MD].values,c='g',marker="^",s=70,alpha=0.6, label="Multi-Dist: %s"%intensity_MD)
            plt.suptitle(col+": %s, %s, %s"%(level[1],level[0],level[2])), plt.legend()
    print "============================ %s =========================== \n"%col
    display(detection[col][cols].sort("residuals")[-10:])
The above plots depict instances where multi-dist assigns high severity (2, 3) whereas double-stochastic assigns lower ratings. On visual inspection, we observe notable differences in relative detection performance among the most dissimilar instances. Most of these observations, although detected by both methods, are assigned vastly different degrees of severity. For instance, in multiple cases where the observed #calls ~ 10 over a baseline #calls ~ 0, multi-dist assigns severity=3, signifying an extreme outlier, while double-stochastic is conservative in flagging such observations.
for col in detection.keys():
    if col != "device+level_9":
        levels = detection[col].sort("residuals")[:10].index.get_level_values(0)
        for level in set(levels):
            sup = calls[calls[col]==level].groupby('date_hour')['call_date_time'].count().loc[timeframe].fillna(0)
            if sup.index.tz is not None:
                sup.index = sup.index.tz_convert("UTC").tz_localize(None)
            fig = plt.figure(figsize=(15,4))
            detected = detection[col].xs(level,level=0)
            times_DS,intensity_DS = detected[detected['double_stochastic']>cut].index, detected[detected['double_stochastic']>cut]['double_stochastic'].values.tolist()
            times_MD,intensity_MD = detected[detected['multi_dist']>0].index, detected[detected['multi_dist']>0]['multi_dist'].values.tolist()
            sup.loc[period.tz_convert("UTC").tz_localize(None)].plot(label="observed")
            compute_baseline(sup).loc[period.tz_convert("UTC").tz_localize(None)].plot(label="baseline")
            plt.scatter(list(times_DS),sup.ix[times_DS].values,c='r',marker="*",s=70,alpha=0.6, label="Double-Stochastic: %s"%np.round(intensity_DS))
            plt.scatter(list(times_MD),sup.ix[times_MD].values,c='g',marker="^",s=70,alpha=0.6, label="Multi-Dist: %s"%intensity_MD)
            plt.suptitle(col+": "+level), plt.legend()
    else:
        temp = detection[col].sort("residuals")[:10].index
        levels = [tuple(mod[0].split("_")+[mod[1]]) for mod in temp]
        for level in set(levels):
            mask = (calls["device_%s_model"%level[1]]==level[0]) & (calls['call_lob_%s'%level[1]]=="true") & (calls['network_level_9_name']==level[2])
            sup = calls[mask].groupby('date_hour')['call_date_time'].count().loc[timeframe].fillna(0)
            if sup.index.tz is not None:
                sup.index = sup.index.tz_convert("UTC").tz_localize(None)
            fig = plt.figure(figsize=(15,4))
            detected = detection[col].xs("%s_%s"%(level[0],level[1]),level=0).xs(level[2],level=0)
            times_DS,intensity_DS = detected[detected['double_stochastic']>cut].index, detected[detected['double_stochastic']>cut]['double_stochastic'].values.tolist()
            times_MD,intensity_MD = detected[detected['multi_dist']>0].index, detected[detected['multi_dist']>0]['multi_dist'].values.tolist()
            sup.loc[period.tz_convert("UTC").tz_localize(None)].plot(label="observed")
            compute_baseline(sup).loc[period.tz_convert("UTC").tz_localize(None)].plot(label="baseline")
            plt.scatter(list(times_DS),sup.ix[times_DS].values,c='r',marker="*",s=70,alpha=0.6, label="Double-Stochastic: %s"%np.round(intensity_DS))
            plt.scatter(list(times_MD),sup.ix[times_MD].values,c='g',marker="^",s=70,alpha=0.6, label="Multi-Dist: %s"%intensity_MD)
            plt.suptitle(col+": %s, %s, %s"%(level[1],level[0],level[2])), plt.legend()
    print "============================ %s =========================== \n"%col
    display(detection[col][cols].sort("residuals")[:10])
The above plots depict instances where double-stochastic assigns high severity whereas multi-dist deems them not so severe. We now observe that double-stochastic detects multiple instances where multi-dist fails to detect anything; for example, cases with baseline #calls ~ 5 and observed #calls ~ 30-40 are only detected by double-stochastic.
One distinctive characteristic of double-stochastic is that it assigns incrementally lower scores to the same time-series (e.g., network_level_9_name = "ten04.everett.wa.seattle.comcast.net") in consecutive hours, since it computes the conditional probability of extreme values given what has already been observed. This is desirable from a marginal relevance perspective.
Relevance and Marginal Relevance: whether an alarm still has distinctive usefulness after the user has looked at certain other alarms (Carbonell and Goldstein, 1998). Even if an alarm is highly relevant, its information can be completely redundant with other alarms which have already been examined. The most extreme case of this is consecutive alarms, a phenomenon that is actually very common. In such circumstances, marginal relevance is clearly a better measure of utility to the user. Maximizing marginal relevance requires returning alarms that exhibit diversity and novelty. Whether this is desirable from a network operator's point of view remains an open question, and more depth here would prove helpful.
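For reference, the maximal marginal relevance criterion of Carbonell and Goldstein (1998), written here in terms of alarms: with $R$ the candidate alarms, $S$ those already examined, and $Q$ the current context, the next alarm shown is
$$\mathrm{MMR}=\arg\max_{a_i\in R\setminus S}\Big[\lambda\,\mathrm{Sim}_1(a_i,Q)-(1-\lambda)\max_{a_j\in S}\mathrm{Sim}_2(a_i,a_j)\Big]$$
where $\lambda$ trades off relevance against novelty; $\lambda=1$ reduces to a pure relevance ranking.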
from IPython.display import HTML
HTML('''<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')